ON THE SPECTRAL NORM OF A RANDOM TOEPLITZ MATRIX 



MARK W. MECKES 

Abstract. Suppose that $T_n$ is a Toeplitz matrix whose entries come from a sequence of independent
but not necessarily identically distributed random variables with mean zero. Under some
additional tail conditions, we show that the spectral norm of $T_n$ is of the order $\sqrt{n \log n}$. The same
result holds for random Hankel matrices as well as other variants of random Toeplitz matrices which
have been studied in the literature.



1. Introduction and results 

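Let $X_0, X_1, X_2, \dots$ be a family of independent random variables. For $n \ge 2$, $T_n$ denotes the
$n \times n$ random symmetric Toeplitz matrix

\[
T_n = [X_{|j-k|}]_{1 \le j,k \le n} =
\begin{bmatrix}
X_0 & X_1 & X_2 & \cdots & X_{n-2} & X_{n-1} \\
X_1 & X_0 & X_1 & \ddots & & X_{n-2} \\
X_2 & X_1 & X_0 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & X_1 & X_2 \\
X_{n-2} & & \ddots & X_1 & X_0 & X_1 \\
X_{n-1} & X_{n-2} & \cdots & X_2 & X_1 & X_0
\end{bmatrix}.
\]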
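The definition of $T_n$ can be made concrete with a short sketch. The function name `symmetric_toeplitz` and the list-of-rows representation are illustrative choices made here, not notation from the paper; the input list plays the role of $X_0, \dots, X_{n-1}$.

```python
def symmetric_toeplitz(xs):
    """Return the n x n matrix [X_{|j-k|}] as a list of rows, for xs = [X_0, ..., X_{n-1}]."""
    n = len(xs)
    # Entry (j, k) depends only on the distance |j - k| from the main diagonal.
    return [[xs[abs(j - k)] for k in range(n)] for j in range(n)]


# Placeholder values 10, 11, 12, 13 stand in for X_0, X_1, X_2, X_3.
for row in symmetric_toeplitz([10, 11, 12, 13]):
    print(row)
```

Each diagonal of the printed matrix is constant, and the matrix is symmetric, as in the display above.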



In [1], Bai asked whether the spectral measure of $n^{-1/2} T_n$ approaches a deterministic limit measure
$\mu$ as $n \to \infty$. Bryc, Dembo, and Jiang [5] and Hammond and Miller [8] independently proved that
this is so when the $X_j$ are identically distributed with variance 1, and that with these assumptions
$\mu$ does not depend on the distribution of the $X_j$. The measure $\mu$ does not appear to be a previously
studied probability measure, and is described via rather complicated expressions for its moments.

This limiting spectral measure $\mu$ has unbounded support, which raises the question of the
asymptotic behavior of the spectral norm $\|T_n\|$, i.e., the maximum absolute value of an eigenvalue of
$T_n$. (This problem is explicitly raised in [5, Remark 1.3].) This paper shows, under slightly different
assumptions from [5, 8], that $\|T_n\|$ is of the order $\sqrt{n \log n}$. Here the $X_j$ need not be identically
distributed, but satisfy stronger moment or tail conditions than in [5, 8]. The spectral norm is also
of the same order for other related random matrix ensembles, including random Hankel matrices.
In the case of Hankel matrices, Theorems 1 and 3 below generalize in a different direction a special
case of a result of Masri and Tonge [14] on multilinear Hankel forms with $\pm 1$ Bernoulli entries.

A random variable $X$ will be called subgaussian if
\[
\mathbb{P}[|X| \ge t] \le 2e^{-at^2} \quad \forall t > 0 \tag{1}
\]
for some constant $a > 0$. A family of random variables is uniformly subgaussian if each satisfies (1)
for the same constant $a$.
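As a sanity check of the definition, the following sketch compares the exact two-sided tail of a standard normal variable with the right-hand side of (1). The choice $a = 1/2$ is an assumption made for this illustration (it is the natural constant for the standard normal case), and the function names are hypothetical.

```python
import math


def gaussian_tail(t):
    """Exact P[|Z| >= t] for a standard normal Z, via the complementary error function."""
    return math.erfc(t / math.sqrt(2.0))


def subgaussian_bound(t, a):
    """Right-hand side 2 * exp(-a * t^2) of the subgaussian estimate (1)."""
    return 2.0 * math.exp(-a * t * t)


a = 0.5  # assumed constant for the standard normal case
for t in [0.0, 0.5, 1.0, 2.0, 4.0]:
    print(t, gaussian_tail(t), subgaussian_bound(t, a))
```

In each printed row the exact tail is below the bound, so a standard normal variable is subgaussian in the sense of (1).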

Theorem 1. Suppose $X_0, X_1, X_2, \dots$ are independent, uniformly subgaussian random variables
with $\mathbb{E}X_j = 0$ for all $j$. Then
\[
\mathbb{E}\|T_n\| \le c_1 \sqrt{n \log n},
\]
where $c_1 > 0$ depends only on the constant $a$ in the subgaussian estimate (1) for the $X_j$.

Simple scaling considerations show that one can take $c_1 = C a^{-1/2}$ for some absolute constant
$C > 0$. In principle an explicit value for $C$ can be extracted from the proof of Theorem 1. No
attempt has been made to do so, since the techniques used in this paper are suited for determining
rough orders of growth, and not precise constants. Similar remarks apply to the constants which
appear in the statements of Theorems 2 and 3 below.

By strengthening the subgaussian assumption, the statement of Theorem 1 can be improved from
a bound on expectations to an almost sure asymptotic bound. Recall that a real-valued random
variable $X$ (or more properly, its distribution) is said to satisfy a logarithmic Sobolev inequality
with constant $A$ if
\[
\mathbb{E}\bigl[f^2(X) \log f^2(X)\bigr] \le 2A\, \mathbb{E}\bigl[f'(X)^2\bigr]
\]
for every smooth $f : \mathbb{R} \to \mathbb{R}$ such that $\mathbb{E}f^2(X) = 1$. Standard normal random variables satisfy a
logarithmic Sobolev inequality with constant 1. Furthermore, it is well known that independent
random variables with bounded logarithmic Sobolev constants are uniformly subgaussian and
possess the same concentration properties as independent normal random variables (see [11] or
[12, Chapter 5]).

Theorem 2. Suppose $X_0, X_1, X_2, \dots$ are independent, $\mathbb{E}X_j = 0$ for all $j$, and for some constant
$A$, either:

(i) for all $j$, $|X_j| \le A$ almost surely; or

(ii) for all $j$, $X_j$ satisfies a logarithmic Sobolev inequality with constant $A$.

Then
\[
\limsup_{n \to \infty} \frac{\|T_n\|}{\sqrt{n \log n}} \le c_2
\]
almost surely, where $c_2 > 0$ depends only on $A$.

We remark that according to the definition used here, $T_n$ is a submatrix of $T_{n+1}$, but this is only
a matter of convenience in notation. Theorem 2 remains true regardless of the dependence among
the random matrices $T_n$ for different values of $n$.

It seems unlikely that the stronger hypotheses of Theorem 2 are necessary. In fact a weaker
version can be proved under the hypotheses of Theorem 1 alone; see the remarks following the
proof of Theorem 2 in Section 2.

When the $X_j$ have variance 1, the upper bound $\sqrt{n \log n}$ of Theorems 1 and 2 is of the correct
order. In fact the matching lower bound holds under less restrictive tail assumptions, as the next
result shows.

Theorem 3. Suppose $X_0, X_1, X_2, \dots$ are independent and for some constant $B$, each $X_j$ satisfies
\[
\mathbb{E}X_j = 0, \qquad \mathbb{E}X_j^2 = 1, \qquad \mathbb{E}|X_j| \ge B.
\]
Then
\[
\mathbb{E}\|T_n\| \ge c_3 \sqrt{n \log n},
\]
where $c_3 > 0$ depends only on $B$.

In the case that $\mathbb{E}X_j^2 = 1$ and $\mathbb{E}|X_j|^3 < \infty$, it is a consequence of Hölder's inequality that
$\mathbb{E}|X_j| \ge (\mathbb{E}|X_j|^3)^{-1}$. Thus the lower bound on first absolute moments assumed in Theorem 3 is
weaker than an upper bound on absolute third moments, and is in particular satisfied for uniformly
subgaussian random variables.

Section 2 below contains the proofs of Theorems 1-3. As mentioned above, Theorems 1-3 also
hold for other ensembles of random Toeplitz matrices, as well as for random Hankel matrices.
Section 3 discusses these extensions of the theorems and makes some additional remarks.

2. Proofs

The proof of Theorem 1 is based on Dudley's entropy bound [6] for the supremum of a subgaussian
random process. Given a random process $\{Y_x : x \in M\}$, a pseudometric on $M$ may be defined by
\[
d(x,y) = \sqrt{\mathbb{E}|Y_x - Y_y|^2}.
\]
The process $\{Y_x : x \in M\}$ is called subgaussian if
\[
\forall x, y \in M,\ \forall t > 0, \quad \mathbb{P}\bigl[|Y_x - Y_y| \ge t\bigr] \le 2\exp\left[-\frac{bt^2}{d(x,y)^2}\right] \tag{2}
\]
for some constant $b > 0$. For $\varepsilon > 0$, the $\varepsilon$-covering number of $(M,d)$, $N(M,d,\varepsilon)$, is the smallest
cardinality of a subset $\mathcal{N} \subset M$ such that
\[
\forall x \in M\ \exists y \in \mathcal{N} : d(x,y) \le \varepsilon.
\]
Dudley's entropy bound is the following (see [18, Proposition 2.1] for the version given here).

Proposition 4. Let $\{Y_x : x \in M\}$ be a subgaussian random process with $\mathbb{E}Y_x = 0$ for every $x \in M$.
Then
\[
\mathbb{E} \sup_{x \in M} |Y_x| \le K \int_0^\infty \sqrt{\log N(M,d,\varepsilon)}\, d\varepsilon,
\]
where $K > 0$ depends only on the constant $b$ in the subgaussian estimate (2) for the process.

We will also need the following version of the classical Azuma-Hoeffding inequality. This can be
proved by a standard Laplace transform argument; see e.g. [13, Fact 2.1].

Proposition 5. Let $X_1, \dots, X_n$ be independent, symmetric, uniformly subgaussian random
variables. Then for any $a_1, \dots, a_n \in \mathbb{R}$ and $t > 0$,
\[
\mathbb{P}\left[\Bigl|\sum_{j=1}^n a_j X_j\Bigr| \ge t\right] \le 2\exp\left[-\frac{bt^2}{\sum_{j=1}^n a_j^2}\right],
\]
where $b > 0$ depends only on the constant $a$ in the subgaussian estimate (1) for the $X_j$.
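For intuition, Proposition 5 can be checked exactly in the simplest symmetric case of independent uniform signs. The sketch below enumerates all sign patterns for a short coefficient vector and compares the exact tail with the bound; taking $b = 1/2$ is an assumption specific to this example (it is the classical Hoeffding constant for $\pm 1$ signs, not a constant taken from the paper), and the coefficient values are arbitrary.

```python
import itertools
import math


def rademacher_tail(coeffs, t):
    """Exact P[|sum_j a_j * eps_j| >= t] for independent uniform signs eps_j in {-1, +1}."""
    hits = 0
    for signs in itertools.product((-1, 1), repeat=len(coeffs)):
        s = sum(a * e for a, e in zip(coeffs, signs))
        if abs(s) >= t:
            hits += 1
    return hits / 2 ** len(coeffs)


def hoeffding_bound(coeffs, t, b=0.5):
    """2 * exp(-b * t^2 / sum_j a_j^2); b = 1/2 is the classical constant for signs."""
    return 2.0 * math.exp(-b * t * t / sum(a * a for a in coeffs))


coeffs = [1.0, 0.5, 2.0, 1.5, 0.25, 1.0, 0.75, 3.0]  # arbitrary illustrative coefficients
for t in [1.0, 3.0, 5.0, 8.0]:
    print(t, rademacher_tail(coeffs, t), hoeffding_bound(coeffs, t))
```

The enumeration is exponential in the number of coefficients, so it is only meant for small examples.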



Proof of Theorem 1. We first reduce to the case in which each $X_j$ is symmetric. Let $T_n'$ be an
independent copy of $T_n$. Since $\mathbb{E}T_n' = 0$, by Jensen's inequality,
\[
\mathbb{E}\|T_n\| \le \mathbb{E}\bigl[\mathbb{E}\bigl[\|T_n - T_n'\| \,\big|\, T_n\bigr]\bigr] = \mathbb{E}\|T_n - T_n'\|.
\]
The random Toeplitz matrix $(T_n - T_n')$ has entries $(X_j - X_j')$ which are independent, symmetric,
uniformly subgaussian random variables (with a possibly smaller constant $a$ in the subgaussian
estimate). Thus we may assume without loss of generality that the $X_j$ are symmetric random
variables.

We next bound $\|T_n\|$ by the supremum of a subgaussian random process. A basic feature of
the theory of Toeplitz matrices is their relationship to multiplication operators (cf. [4, Chapter 1]).
Specifically, the finite Toeplitz matrix $T_n$ is an $n \times n$ submatrix of the infinite Laurent matrix
\[
L_n = \bigl[X_{|j-k|} \mathbf{1}_{|j-k| \le n-1}\bigr]_{j,k \in \mathbb{Z}}.
\]
Consider $L_n$ as an operator on $\ell^2(\mathbb{Z})$ in the canonical way, and let $\psi : \ell^2(\mathbb{Z}) \to L^2[0,1]$ denote the
usual trigonometric isometry $\psi(e_j)(x) = e^{2\pi i j x}$. Then $\psi L_n \psi^{-1} : L^2 \to L^2$ is the multiplication
operator corresponding to the $L^\infty$ function
\[
f(x) = \sum_{j=-(n-1)}^{n-1} X_{|j|} e^{2\pi i j x} = X_0 + 2\sum_{j=1}^{n-1} \cos(2\pi j x) X_j.
\]
Therefore
\[
\|T_n\| \le \|L_n\| = \|f\|_\infty = \sup_{0 \le x \le 1} |Y_x|, \tag{3}
\]
where
\[
Y_x = X_0 + 2\sum_{j=1}^{n-1} \cos(2\pi j x) X_j.
\]
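The bound (3) can be illustrated numerically. The sketch below is not part of the proof: it estimates $\|T_n\|$ from below by power iteration and approximates $\sup_x |Y_x|$ on a finite grid, so both quantities are approximations. All function names, the grid size, and the iteration count are choices made here for illustration.

```python
import math
import random


def symmetric_toeplitz(xs):
    n = len(xs)
    return [[xs[abs(j - k)] for k in range(n)] for j in range(n)]


def spectral_norm_lower(T, iters=500, seed=0):
    """Power-iteration estimate of ||T|| for symmetric T; ||Tv|| / ||v|| never exceeds ||T||."""
    rng = random.Random(seed)
    n = len(T)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    est = 0.0
    for _ in range(iters):
        w = [sum(T[j][k] * v[k] for k in range(n)) for j in range(n)]
        norm_v = math.sqrt(sum(c * c for c in v))
        norm_w = math.sqrt(sum(c * c for c in w))
        if norm_w == 0.0:
            return 0.0
        est = norm_w / norm_v
        v = [c / norm_w for c in w]  # renormalize to avoid overflow or underflow
    return est


def symbol_sup(xs, grid=4096):
    """Grid approximation of sup over [0, 1] of |Y_x|, Y_x = X_0 + 2 sum_j cos(2 pi j x) X_j."""
    best = 0.0
    for m in range(grid + 1):
        x = m / grid
        y = xs[0] + 2.0 * sum(xs[j] * math.cos(2.0 * math.pi * j * x) for j in range(1, len(xs)))
        best = max(best, abs(y))
    return best


xs = [0.0, 1.0, 0.0, 0.0, 0.0]  # deterministic example: X_1 = 1, all other X_j = 0
print(spectral_norm_lower(symmetric_toeplitz(xs)), symbol_sup(xs))
```

For this deterministic example $T_5$ is the adjacency matrix of a path on five vertices, with norm $2\cos(\pi/6) = \sqrt{3}$, while $\sup_x |Y_x| = \sup_x |2\cos(2\pi x)| = 2$, consistent with (3).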

By Proposition 5, the random process $\{Y_x : x \in [0,1]\}$ becomes subgaussian if $M = [0,1]$ is
equipped with the pseudometric
\[
d(x,y) = \sqrt{\sum_{j=1}^{n-1} \bigl[\cos(2\pi j x) - \cos(2\pi j y)\bigr]^2}.
\]
Finally, we bound $N([0,1],d,\varepsilon)$ in order to apply Proposition 4. Since $|\cos t| \le 1$ always, it follows
that $d(x,y) \le 2\sqrt{n}$ and therefore $N([0,1],d,\varepsilon) = 1$ if $\varepsilon \ge 2\sqrt{n}$. Next, since $|\cos s - \cos t| \le |s - t|$,
\[
d(x,y) \le 2\pi |x - y| \sqrt{\sum_{j=1}^{n-1} j^2} < 4n^{3/2} |x - y|,
\]
which implies that
\[
N([0,1],d,\varepsilon) \le N\left([0,1], |\cdot|, \frac{\varepsilon}{4n^{3/2}}\right) \le \frac{4n^{3/2}}{\varepsilon}.
\]
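The two bounds on $d$ and the resulting covering estimate can be spot-checked numerically. The following is a minimal sketch under assumptions made here: random test pairs in $[0,1]$, a small floating-point slack, and a uniform net with centers at odd multiples of $\delta = \varepsilon/(4n^{3/2})$; none of these names or choices come from the paper.

```python
import math
import random


def d(x, y, n):
    """Pseudometric from the proof: root of sum_{j=1}^{n-1} [cos(2 pi j x) - cos(2 pi j y)]^2."""
    return math.sqrt(sum(
        (math.cos(2.0 * math.pi * j * x) - math.cos(2.0 * math.pi * j * y)) ** 2
        for j in range(1, n)
    ))


def bounds_hold(n, trials=200, seed=1):
    """Check d <= 2 sqrt(n) and d <= 4 n^{3/2} |x - y| on random pairs in [0, 1]."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        dist = d(x, y, n)
        # The 1e-12 slack only guards against rounding when x and y nearly coincide.
        if dist > 2.0 * math.sqrt(n) or dist > 4.0 * n ** 1.5 * abs(x - y) + 1e-12:
            return False
    return True


def net_size(n, eps):
    """Number of centers delta, 3*delta, 5*delta, ... needed to cover [0, 1] within delta."""
    delta = eps / (4.0 * n ** 1.5)
    return max(1, math.ceil(1.0 / (2.0 * delta)))


print(bounds_hold(5), bounds_hold(40), net_size(10, 1.0), 4.0 * 10 ** 1.5 / 1.0)
```

By the Lipschitz bound, a $\delta$-net of $[0,1]$ in the usual metric is an $\varepsilon$-net for $d$, and its size stays below $4n^{3/2}/\varepsilon$ whenever $\varepsilon \le 2\sqrt{n}$.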



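By (3), Proposition 4, and the substitution $\varepsilon = 4n^{3/2} e^{-t^2/2}$,
\[
\mathbb{E}\|T_n\| \le K \int_0^{2\sqrt{n}} \sqrt{\log\left(\frac{4n^{3/2}}{\varepsilon}\right)}\, d\varepsilon
= 2\sqrt{2}\, n^{3/2} K \int_{\sqrt{2 \log 2n}}^\infty t^2 e^{-t^2/2}\, dt. \tag{4}
\]
Integration by parts and the classical estimate $\frac{1}{\sqrt{2\pi}} \int_s^\infty e^{-t^2/2}\, dt \le e^{-s^2/2}$ for $s > 0$ yield
\[
\int_s^\infty t^2 e^{-t^2/2}\, dt \le \bigl(s + \sqrt{2\pi}\bigr) e^{-s^2/2}.
\]
Combining the case $s = \sqrt{2 \log 2n}$ of this estimate with (4) completes the proof. □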
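The last step of the proof can be traced numerically. In the sketch below, `tail_integral` uses the closed form $s e^{-s^2/2} + \sqrt{\pi/2}\,\operatorname{erfc}(s/\sqrt{2})$ obtained from integration by parts, and `final_bound_over_K` evaluates the right-hand side of (4) divided by the unspecified constant $K$ after inserting the tail estimate. The function names are illustrative assumptions.

```python
import math


def tail_integral(s):
    """Exact integral of t^2 * exp(-t^2 / 2) over [s, infinity), via integration by parts."""
    return s * math.exp(-s * s / 2.0) + math.sqrt(math.pi / 2.0) * math.erfc(s / math.sqrt(2.0))


def tail_bound(s):
    """Upper bound (s + sqrt(2 pi)) * exp(-s^2 / 2) used in the proof."""
    return (s + math.sqrt(2.0 * math.pi)) * math.exp(-s * s / 2.0)


def final_bound_over_K(n):
    """Right-hand side of (4) divided by K, with the tail bound at s = sqrt(2 log 2n)."""
    s = math.sqrt(2.0 * math.log(2.0 * n))
    return 2.0 * math.sqrt(2.0) * n ** 1.5 * tail_bound(s)


for n in [2, 10, 1000, 10 ** 6]:
    print(n, final_bound_over_K(n) / math.sqrt(n * math.log(n)))
```

Since $e^{-s^2/2} = 1/(2n)$ at $s = \sqrt{2\log 2n}$, the bound equals $\sqrt{2}\,K\sqrt{n}\,(\sqrt{2\log 2n} + \sqrt{2\pi})$, and the printed ratios to $\sqrt{n \log n}$ remain bounded, which is the content of Theorem 1.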

The proof of Theorem 2 is based on rather classical measure concentration arguments commonly
applied to probability in Banach spaces.

Proof of Theorem 2. Denote by $M_0$ the $n \times n$ identity matrix, and for $m = 1, \dots, n-1$ let
$M_m = [\mathbf{1}_{|j-k|=m}]_{1 \le j,k \le n}$. Then $T_n$ can be written as the sum
\[
T_n = \sum_{j=0}^{n-1} X_j M_j
\]
of independent random vectors in the finite-dimensional Banach space $M_n$ equipped with the
spectral norm. Observe that $\|M_j\| \le 2$ for every $j$.

Under the assumption (i), up to the precise values of constants the estimate
\[
\mathbb{P}\bigl[\|T_n\| \ge \mathbb{E}\|T_n\| + t\bigr] \le e^{-t^2/(32A^2 n)} \quad \forall t > 0
\]
follows from any of several standard approaches to concentration of measure (cf. Corollary 1.17,
Corollary 4.5, or Theorem 7.3 of [12]; the precise statement can be proved from Corollary 1.17).
Combining this with Theorem 1 yields
\[
\mathbb{P}\bigl[\|T_n\| \ge (c_1 + 8A)\sqrt{n \log n}\bigr] \le \frac{1}{n^2},
\]
which completes the proof via the Borel-Cantelli lemma.
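The arithmetic behind the Borel-Cantelli step is short enough to check directly: taking $t = 8A\sqrt{n \log n}$ in the concentration estimate gives $e^{-2\log n} = n^{-2}$, which is summable. The sketch below verifies this numerically; the function name and the sample values of $n$ and $A$ are illustrative assumptions.

```python
import math


def tail_at_threshold(n, A):
    """Value of exp(-t^2 / (32 A^2 n)) at the threshold t = 8 A sqrt(n log n)."""
    t = 8.0 * A * math.sqrt(n * math.log(n))
    return math.exp(-t * t / (32.0 * A * A * n))


for n in [2, 50, 1000]:
    print(n, tail_at_threshold(n, A=3.0), 1.0 / n ** 2)

# Partial sums of 1/n^2 stay below pi^2 / 6, so the exceptional events are summable.
print(sum(1.0 / n ** 2 for n in range(1, 100001)), math.pi ** 2 / 6.0)
```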

The proof under the assumption (ii) is similar. By the triangle inequality and the Cauchy-Schwarz
inequality,
\[
\|T_n\| \le 2\sqrt{n} \sqrt{\sum_{j=0}^{n-1} X_j^2},
\]
so that the map $(X_0, \dots, X_{n-1}) \mapsto \|T_n\|$ has Lipschitz constant bounded by $2\sqrt{n}$. By the
well-known tensorization and measure concentration properties of logarithmic Sobolev inequalities (cf.
[11, Sections 2.1-2.3] or [12, Sections 5.1-5.2]),
\[
\mathbb{P}\bigl[\|T_n\| \ge \mathbb{E}\|T_n\| + t\bigr] \le e^{-t^2/(4An)} \quad \forall t > 0.
\]
The proof is completed in the same way as before (with a different dependence of $c_2$ on $A$). □

As remarked above, a weaker version of Theorem 2 may be proved under the assumptions of
Theorem 1 alone. From the proof of Proposition 4 in [18] one can extract the following tail inequality
under the assumptions of Proposition 4:
\[
\mathbb{P}\left[\sup_{x \in M} |Y_x| \ge t\right] \le 2e^{-ct^2/\sigma^2} \quad \forall t > 0,
\quad \text{where } \sigma = \int_0^\infty \sqrt{\log N(M,d,\varepsilon)}\, d\varepsilon. \tag{5}
\]
The explicit statement here is adapted from lecture notes of Rudelson [16]. Using the estimates
derived in the proof of Theorem 1 and applying the Borel-Cantelli lemma as above, one directly
obtains
\[
\limsup_{n \to \infty} \frac{\|T_n\|}{\sqrt{n} \log n} \le c_4 \quad \text{almost surely} \tag{6}
\]



under the assumptions that the $X_j$ are symmetric and uniformly subgaussian. The general
(nonsymmetric but mean 0) case can be deduced from the argument for the symmetric case. Let $T_n'$ be
an independent copy of $T_n$. By independence, the triangle inequality, and the tail estimate which
follows from (5),
\[
\mathbb{P}\bigl[\|T_n'\| \le s\bigr]\, \mathbb{P}\bigl[\|T_n\| \ge s + t\bigr] \le \mathbb{P}\bigl[\|T_n - T_n'\| \ge t\bigr] \le 2e^{-ct^2/(n \log n)}
\]
for some constant $c$ which depends on the subgaussian estimate for the $X_j$. By Theorem 1 and
Chebyshev's inequality,
\[
\mathbb{P}\bigl[\|T_n'\| \le s\bigr] \ge 1 - \frac{1}{s} c_1 \sqrt{n \log n}.
\]
Picking $s = 2c_1 \sqrt{n \log n}$ and $t = \sqrt{2n/c}\, \log n$ yields
\[
\mathbb{P}\bigl[\|T_n\| \ge c_4 \sqrt{n} \log n\bigr] \le \frac{4}{n^2}
\]
for some constant $c_4$, and (6) then follows from the Borel-Cantelli lemma.

The proof of Theorem 3 amounts to an adaptation of the proof of the lower bound in [14],
with much of the proof abstracted into a general lower bound for the suprema of certain random
processes due to Kashin and Tzafriri [9, 10]. The following is a special case of the result of [10].






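Proposition 6. Let $\varphi_j : [0,1] \to \mathbb{R}$, $j = 0, \dots, n-1$ be a family of functions which are orthonormal
in $L^2[0,1]$ and satisfy $\|\varphi_j\|_{L^3[0,1]} \le A$ for every $j$, and let $X_0, \dots, X_{n-1}$ be independent random
variables such that for every $j$,
\[
\mathbb{E}X_j = 0, \qquad \mathbb{E}X_j^2 = 1, \qquad \mathbb{E}|X_j| \ge B.
\]
Then for any $a_0, \dots, a_{n-1} \in \mathbb{R}$,
\[
\mathbb{E}\left[\sup_{0 \le x \le 1} \Bigl|\sum_{j=0}^{n-1} a_j X_j \varphi_j(x)\Bigr|\right]
\ge K \|a\|_2 \sqrt{\log\left(\frac{\|a\|_2}{\|a\|_4}\right)},
\]
where $\|a\|_p = \bigl(\sum_{j=0}^{n-1} |a_j|^p\bigr)^{1/p}$ and $K > 0$ depends only on $A$ and $B$.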



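Proof of Theorem 3. First make the estimate
\[
\|T_n\| = \sup_{v \in \mathbb{C}^n \setminus \{0\}} \frac{|\langle T_n v, v \rangle|}{\langle v, v \rangle}
\ge \sup_{0 \le x \le 1} \frac{1}{n} \bigl|\langle T_n v_x, v_x \rangle\bigr|,
\]
where $v_x \in \mathbb{C}^n$ is defined by $(v_x)_j = e^{2\pi i j x}$ for $j = 1, \dots, n$ and $\langle \cdot, \cdot \rangle$ is the standard inner product
on $\mathbb{C}^n$. Therefore
\begin{align*}
\|T_n\| &\ge \frac{1}{n} \sup_{0 \le x \le 1} \Bigl|\sum_{j,k=1}^{n} X_{|j-k|} e^{2\pi i (j-k) x}\Bigr| \\
&= \sup_{0 \le x \le 1} \Bigl|\sum_{j=-(n-1)}^{n-1} \Bigl(1 - \frac{|j|}{n}\Bigr) X_{|j|} e^{2\pi i j x}\Bigr| \\
&= \sup_{0 \le x \le 1} \Bigl|X_0 + 2\sum_{j=1}^{n-1} \Bigl(1 - \frac{j}{n}\Bigr) X_j \cos(2\pi j x)\Bigr| \\
&= \sup_{0 \le x \le 1} \Bigl|\sum_{j=0}^{n-1} a_j X_j \varphi_j(x)\Bigr|,
\end{align*}
where we have defined $a_0 = 1$, $a_j = \sqrt{2}(1 - j/n)$ for $j \ge 1$, $\varphi_0 \equiv 1$, and $\varphi_j(x) = \sqrt{2} \cos(2\pi j x)$ for
$j \ge 1$. It is easy to verify that $\|a\|_2 \ge \sqrt{n/2}$ and $\|a\|_4 \le 2n^{1/4}$. The theorem now follows from
Proposition 6. □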
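The two norm estimates on the coefficient vector $a$ used at the end of the proof can be confirmed numerically for a range of $n$. This is a minimal sketch with hypothetical helper names; it checks finitely many cases and is not a substitute for the elementary verification.

```python
import math


def coefficients(n):
    """a_0 = 1 and a_j = sqrt(2) * (1 - j/n) for 1 <= j <= n-1, as in the proof of Theorem 3."""
    return [1.0] + [math.sqrt(2.0) * (1.0 - j / n) for j in range(1, n)]


def p_norm(a, p):
    """The l^p norm (sum_j |a_j|^p)^(1/p)."""
    return sum(abs(c) ** p for c in a) ** (1.0 / p)


def estimates_hold(n):
    """Check ||a||_2 >= sqrt(n/2) and ||a||_4 <= 2 n^(1/4)."""
    a = coefficients(n)
    return p_norm(a, 2) >= math.sqrt(n / 2.0) and p_norm(a, 4) <= 2.0 * n ** 0.25


print(all(estimates_hold(n) for n in range(2, 300)))
```

With these estimates, $\|a\|_2 / \|a\|_4 \ge n^{1/4} / (2\sqrt{2})$, so the logarithm in Proposition 6 is of order $\log n$ for large $n$ and the lower bound is of order $\sqrt{n \log n}$.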



We remark that by combining Theorem 3 with the proof of Theorem 2, one obtains a nontrivial
bound on the left tail of $\|T_n\|$ under the assumptions of Theorem 2 and the additional assumption
that $\mathbb{E}X_j^2 = 1$ for every $j$. Unfortunately, one cannot deduce an almost sure lower bound of the
form
\[
\liminf_{n \to \infty} \frac{\|T_n\|}{\sqrt{n \log n}} \ge c \quad \text{almost surely}
\]
without more precise control over the constants in Proposition 6 and the concentration inequalities
used in the proof of Theorem 2.






3. Extensions and additional remarks 

3.1. Other random matrix ensembles. For simplicity Theorems 1-3 were stated and proved
only for the case of real symmetric Toeplitz matrices. However, straightforward adaptations of the
proofs show that the theorems hold for other related ensembles of random matrices. These include
nonsymmetric real Toeplitz matrices $[X_{j-k}]_{1 \le j,k \le n}$ for independent random variables $X_j$, $j \in \mathbb{Z}$, as
well as complex Hermitian or general complex Toeplitz variants. In the complex cases one should
consider matrix entries of the form $X_j = Y_j + iZ_j$, where $Y_j$ and $Z_j$ are independent and each
satisfy the tail or moment conditions imposed on $X_j$ in the theorems as stated.

Closely related to the case of nonsymmetric random Toeplitz matrices are random Hankel
matrices $H_n = [X_{j+k-1}]_{1 \le j,k \le n}$, which are constant along skew diagonals. This ensemble was also
mentioned by Bai [1], and was shown to have a universal limiting spectral distribution in [5].
Independently, Masri and Tonge [14] considered a random $r$-linear Hankel form
\[
(v_1, \dots, v_r) \mapsto \sum_{j_1, \dots, j_r = 0}^{n} X_{j_1 + \cdots + j_r} (v_1)_{j_1} \cdots (v_r)_{j_r},
\]
in the case $\mathbb{P}[X_j = 1] = \mathbb{P}[X_j = -1] = 1/2$, and showed that the expected norm of this form is of
the order $\sqrt{n^{r-1} \log n}$. As observed in [5, Remark 1.2], $H_n$ has the same singular values, and so in
particular the same spectral norm, as the (nonsymmetric) Toeplitz matrix obtained by reversing
the order of the rows of $H_n$. Therefore Theorems 1-3 apply to $H_n$ as well. As mentioned in the
introduction, the versions of Theorems 1 and 3 for $H_n$ generalize the $r = 2$ case of the result of [14]
to subgaussian matrix entries $X_j$.
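The row-reversal observation is easy to see in a small example. The sketch below builds $H_n = [X_{j+k-1}]$ from placeholder entries and checks that reversing the order of its rows gives a matrix that is constant along diagonals; the helper names and the placeholder values are assumptions made for illustration.

```python
def hankel(xs, n):
    """n x n Hankel matrix [X_{j+k-1}] for 1 <= j, k <= n, with xs[m] playing the role of X_m."""
    return [[xs[j + k - 1] for k in range(1, n + 1)] for j in range(1, n + 1)]


def is_toeplitz(M):
    """True if M is constant along every diagonal."""
    n = len(M)
    return all(M[j][k] == M[j + 1][k + 1] for j in range(n - 1) for k in range(n - 1))


n = 4
xs = list(range(100, 100 + 2 * n))  # only xs[1], ..., xs[2n-1] are used; xs[0] is a placeholder
H = hankel(xs, n)
R = H[::-1]  # reverse the order of the rows
print(is_toeplitz(H), is_toeplitz(R))
```

Reversing rows amounts to multiplying by a permutation matrix, which is orthogonal, so the singular values and hence the spectral norm are unchanged.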

The methods of this paper can also be used to treat random Toeplitz matrices with additional
restrictions. For example, the theorems apply to the ensemble of symmetric circulant matrices
considered in [2, Remark 2], which is defined as $T_n$ here except for the restriction that $X_{n-j} = X_j$
for $j = 1, \dots, n-1$, and the closely related symmetric palindromic Toeplitz matrices considered
in [15], in which $X_{n-j-1} = X_j$ for $j = 0, \dots, n-1$. We remark that [15] show that each of
these ensembles, properly scaled and with some additional assumptions, has a limiting spectral
distribution which is normal.

3.2. Weaker hypotheses. It is unclear how necessary the tail or moment conditions on the $X_j$
are to the conclusions of the theorems. It appears likely (cf. [19, 3]) that versions of Theorems
1 and 2 remain true assuming only the existence of fourth moments, at least when the $X_j$ are
identically distributed. In particular it is very likely that the assumptions of Theorem 2 can be
relaxed considerably. Even within the present proof, the assumption of a logarithmic Sobolev
inequality can be weakened slightly to that of a quadratic transportation cost inequality; cf. [12,
Chapter 6].

If the $X_j$ have nonzero means then the behavior of $\|T_n\|$ may change. Suppose first that the $X_j$
are uniformly subgaussian and $\mathbb{E}X_j = m \ne 0$ for every $j$. If $J_n$ denotes the $n \times n$ matrix whose
entries are all 1, then (6) implies that
\[
\limsup_{n \to \infty} \frac{\|T_n - mJ_n\|}{\sqrt{n} \log n} \le c \quad \text{almost surely}, \tag{7}
\]
where $c$ depends on $m$ and the subgaussian estimate for the $X_j$. Since $\|J_n\| = n$, (7) and the
triangle inequality imply a strong law of large numbers:
\[
\lim_{n \to \infty} \frac{\|T_n\|}{n} = |m| \quad \text{almost surely}. \tag{8}
\]
In [3], (8) was proved using estimates from [5] under the assumption that the $X_j$ are identically
distributed and have finite variance. We emphasize again that while the methods of this paper
require stronger tail conditions, we never assume the $X_j$ to be identically distributed.

More generally, the behavior of $\|T_n\|$ depends on the rate of growth of the spectral norms of the
deterministic Toeplitz matrices $\mathbb{E}T_n$. The same argument as above shows that
\[
\lim_{n \to \infty} \frac{\|T_n\|}{\|\mathbb{E}T_n\|} = 1 \quad \text{almost surely}
\]
if the random variables $(X_j - \mathbb{E}X_j)$ are uniformly subgaussian and $\lim_{n \to \infty} \frac{\sqrt{n} \log n}{\|\mathbb{E}T_n\|} = 0$. On the
other hand, if $\|\mathbb{E}T_n\| = o(\sqrt{n \log n})$ then the conclusion of Theorem 1 holds.

3.3. Random trigonometric polynomials. The supremum of the random trigonometric
polynomial
\[
Z_x = \sum_{j=1}^{n} X_j \cos(2\pi j x)
\]
has been well-studied in the special case $\mathbb{P}[X_j = 1] = \mathbb{P}[X_j = -1] = 1/2$, in work dating back to
Salem and Zygmund [17]. Observe that $Z_x$ is essentially equivalent to the process $Y_x$ defined in
the proof of Theorem 1, and is also closely related to the random process considered in the proof
of Theorem 3. Halász [7] proved in particular that
\[
\lim_{n \to \infty} \frac{\sup_{0 \le x \le 1} |Z_x|}{\sqrt{n \log n}} = 1 \quad \text{almost surely}.
\]
From this it follows that when $\mathbb{P}[X_j = 1] = \mathbb{P}[X_j = -1] = 1/2$ for every $j$, the conclusion of
Theorem 2 holds with $c_2 = 2$. Numerical experiments suggest, however, that the optimal value of
$c_2$ is 1 in this case, and more generally when the $X_j$ are i.i.d. with mean 0 and variance 1.
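A small simulation gives a feel for the normalization in Halász's theorem. The sketch below is only an illustration and not the numerical experiment referred to in the text: it draws uniform random signs, approximates $\sup_x |Z_x|$ on a finite grid, and divides by $\sqrt{n \log n}$. The grid density, the seed, the index range $j = 1, \dots, n$, and the function names are all assumptions made here.

```python
import math
import random


def sup_abs_Z(signs, grid_per_term=16):
    """Grid approximation of sup over [0, 1] of |sum_{j=1}^n X_j cos(2 pi j x)|."""
    n = len(signs)
    grid = grid_per_term * n  # more than 2n grid points, so the polynomial is resolved
    best = 0.0
    for m in range(grid + 1):
        x = m / grid
        z = sum(s * math.cos(2.0 * math.pi * (j + 1) * x) for j, s in enumerate(signs))
        best = max(best, abs(z))
    return best


def normalized_sup(n, seed=0):
    """sup |Z_x| / sqrt(n log n) for one sample of independent uniform signs."""
    rng = random.Random(seed)
    signs = [rng.choice((-1, 1)) for _ in range(n)]
    return sup_abs_Z(signs) / math.sqrt(n * math.log(n))


for n in [16, 64, 128]:
    print(n, normalized_sup(n))
```

A single sample at moderate $n$ need not be close to the almost sure limit, so the printed values should be read as rough indications only.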

Conversely, adaptations of the proofs in this paper yield less numerically precise bounds for the
supremum of $Z_x$ under the same weaker assumptions on the $X_j$ in the statements of the theorems.
We remark that the techniques used to prove the results of [10, 14] cited above (and hence
indirectly also Theorem 3) were adapted from the work of Salem and Zygmund in [17].